Improve performance 20%-110% #14

Merged
merged 13 commits into arangodb:master on Dec 2, 2019

Conversation

htmldoug
Contributor

@htmldoug htmldoug commented Sep 24, 2019

Summary

Adds simple benchmarks for VPackBuilder, VPackSlice, and VPackParser. Improves runtime performance by ~110% within VPackBuilder/VPackSlice. String => VPackSlice is dominated by json parsing and remains about the same.

Results

https://jmh.morethan.io/?sources=https://gist.githubusercontent.com/htmldoug/d292b301fdc3ec66cd7098e4ee54e965/raw/14e54eb2c9b153ba81931d16b276511eb3796f7f/jmh-result-before.json,https://gist.githubusercontent.com/htmldoug/d292b301fdc3ec66cd7098e4ee54e965/raw/14e54eb2c9b153ba81931d16b276511eb3796f7f/jmh-result-optimized.json

Context

I don't use arangodb, but I'm interested in using velocypack to reduce the heap usage of my json processing. I recently saw a 35 MB byte array get parsed into 270 MB of heap by play-json's JsValue. "Case classes ate my RAM" inspired backing the JsValue with a VPackSlice, which drops the heap usage to ~38 MB at the cost of about 33% more CPU time.

JMH + JFR + JMC revealed some low-hanging fruit to cut the extra CPU time roughly in half for my use case. I also benched a real-world example of String => VPackSlice from arangodb/velocypack. The majority of the time is spent on json parsing, although throughput still improved by 20%.

Optimizations

The gains come mostly from reducing redundant UTF-8 encoding via String.getBytes() and from replacing HashMap<Integer, T> lookups keyed by sequential indexes with simple T[] arrays.
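For illustration only (not code from this PR), here is a minimal sketch of both patterns in isolation; class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: lookups keyed by small sequential indexes are cheaper
// through a plain array than through a HashMap<Integer, T> (no boxing, no hashing).
final class IndexedLookup<T> {
    private final Object[] byIndex;

    IndexedLookup(final int size) {
        byIndex = new Object[size];
    }

    void put(final int index, final T value) {
        byIndex[index] = value;            // index is known to be in [0, size)
    }

    @SuppressWarnings("unchecked")
    T get(final int index) {
        return (T) byIndex[index];         // O(1), no Integer boxing, no hashCode()
    }
}

// Hypothetical sketch: encode a repeatedly used string to UTF-8 once and reuse the bytes,
// instead of calling String.getBytes() on every use.
final class CachedUtf8 {
    private final String value;
    private byte[] utf8;                   // lazily encoded on first use

    CachedUtf8(final String value) {
        this.value = value;
    }

    byte[] utf8Bytes() {
        if (utf8 == null) {
            utf8 = value.getBytes(StandardCharsets.UTF_8);
        }
        return utf8;
    }
}
```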

CLA

I've emailed a signed CLA to cla@arangodb.com.

@htmldoug htmldoug changed the title from "Performance Optimizations" to "Improve performance 20%-110%" on Sep 24, 2019
@OmarAyo

OmarAyo commented Sep 25, 2019

CLA Available

* Index of the string bytes within {@link vpack},
* i.e. tag byte and length are somewhere before this index.
*/
private int start;
Contributor Author

@htmldoug htmldoug Sep 25, 2019

I wasn't sure about this decision. Starting at the tag byte would make it more similar to VPackSlice, but it would add a bit more complexity to the compareToBytes operations. Those comparisons are on the hot path for the O(log n) path queries and for sorting the offset table while constructing the object.

I couldn't think of a good way to make this class private, so it's probably worth giving it some thought before merging so we don't break compatibility later.

@mpv1989 @hkernbach What do you think?
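For context, a comparison like the one described above can work directly on the raw UTF-8 bytes in the buffer without building intermediate Strings; the following is a hypothetical sketch, not the code under review.

```java
final class ByteCompare {

    // Hypothetical sketch: compare a key's UTF-8 bytes against the string bytes
    // stored at `start` inside the vpack buffer, treating bytes as unsigned.
    static int compareToBytes(final byte[] vpack, final int start, final int length, final byte[] key) {
        final int common = Math.min(length, key.length);
        for (int i = 0; i < common; i++) {
            final int diff = (vpack[start + i] & 0xFF) - (key[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return length - key.length;        // shorter string sorts first on a shared prefix
    }
}
```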

@rashtao
Collaborator

rashtao commented Sep 26, 2019

@htmldoug Thanks for contributing!

@rashtao rashtao self-assigned this Sep 26, 2019
@htmldoug
Contributor Author

htmldoug commented Oct 4, 2019

@rashtao any idea when you'll get to this or should I fork?

@rashtao
Collaborator

rashtao commented Oct 7, 2019

@htmldoug thanks for the reminder, we will likely review it in the next few weeks.

@htmldoug
Contributor Author

I'll give this another two weeks then probably close and fork.

@siilike
Contributor

siilike commented Nov 15, 2019

It is sad to see that such great (and obvious!) performance improvements take so long to get accepted.

@rashtao
Collaborator

rashtao commented Nov 19, 2019

@htmldoug I am reviewing the PR, but there are some points unclear to me:

  1. the code has many warnings, e.g. unused imports and Javadoc warnings
  2. the code is not compatible with Java 1.6, which is currently the project's base Java version
  3. using -XX:+UnlockCommercialFeatures and JmhFlightRecorderProfiler is not compatible with our license and cannot be used with open-source software, at least until we move to JDK 11
  4. running the benchmark I see no improvement for Bench.fromJson, actually it gets slightly worse (0.3% slower). In the other 2 benchmarks I get results similar to yours.

I would respectfully suggest that you:

  1. fix
  2. set the minimum required java version to 1.7 for the project
  3. remove -XX:+UnlockCommercialFeatures and JmhFlightRecorderProfiler
  4. you can leave Bench.fromJson as is, so it will help us in the future

Also, raising the minimum required Java version to 7 would enable us to use ByteBuffers, avoiding Arrays.copy every time we have to resize the buffers, which imho would yield considerable performance improvements.

Thanks for contributing and sorry for the late reply!

siilike added a commit to siilike/java-velocypack that referenced this pull request Nov 19, 2019
@siilike siilike mentioned this pull request Nov 19, 2019
@htmldoug
Contributor Author

htmldoug commented Nov 30, 2019

the code has many warnings, e.g. unused imports and Javadoc warnings

I added the -Werror compiler arg and fixed the warnings. I couldn't find any Javadoc warnings. If they're still there, I'll need some guidance to repro.

set the minimum required java version to 1.7 for the project

done!

remove -XX:+UnlockCommercialFeatures and JmhFlightRecorderProfiler

Done. Replaced with: if (java.version >= 11) then jvmArgs += -XX:StartFlightRecording.

running the benchmark I see no improvement for Bench.fromJson, actually it gets slightly worse (0.3% slower). In the other 2 benchmarks I get results similar to yours.

Good observation. Looks like the results of that benchmark vary significantly between JVM forks. With @Fork(5), the results even out:

Before:
Benchmark                                    Mode  Cnt        Score       Error   Units
Bench.fromJson                               avgt  150        6.260 ±     0.089   ms/op
Bench.fromJson:·gc.alloc.rate                avgt  150      730.280 ±    11.476  MB/sec
Bench.fromJson:·gc.alloc.rate.norm           avgt  150  7171474.987 ± 16830.651    B/op
Bench.fromJson:·gc.churn.G1_Eden_Space       avgt  150      440.880 ±     9.614  MB/sec
Bench.fromJson:·gc.churn.G1_Eden_Space.norm  avgt  150  4329014.558 ± 62921.591    B/op
Bench.fromJson:·gc.churn.G1_Old_Gen          avgt  150        0.007 ±     0.010  MB/sec
Bench.fromJson:·gc.churn.G1_Old_Gen.norm     avgt  150       67.622 ±    94.356    B/op
Bench.fromJson:·gc.count                     avgt  150     1201.000              counts
Bench.fromJson:·gc.time                      avgt  150      693.000                  ms
Benchmark result is saved to target/jmh-result/2019-11-29_20-00-26.json

After:
Benchmark                                    Mode  Cnt        Score       Error   Units
Bench.fromJson                               avgt  150        6.237 ±     0.121   ms/op
Bench.fromJson:·gc.alloc.rate                avgt  150      704.050 ±    15.752  MB/sec
Bench.fromJson:·gc.alloc.rate.norm           avgt  150  6868573.344 ± 11200.920    B/op
Bench.fromJson:·gc.churn.G1_Eden_Space       avgt  150      413.042 ±    11.590  MB/sec
Bench.fromJson:·gc.churn.G1_Eden_Space.norm  avgt  150  4029680.844 ± 68520.798    B/op
Bench.fromJson:·gc.count                     avgt  150      917.000              counts
Bench.fromJson:·gc.time                      avgt  150      614.000                  ms
Benchmark result is saved to target/jmh-result/2019-11-29_20-06-51.json

alloc.rate.norm seems to have improved, but throughput is within the margin of error. Jackson dominates CPU time on this benchmark, so I guess the lack of improvement there shouldn't be surprising:

[Screenshot: profiler view from 2019-11-29 showing json parsing (Jackson) dominating CPU time]
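For reference, the fork count mentioned above is configured with JMH's @Fork annotation; a minimal, hypothetical benchmark skeleton (not the actual Bench class) might look like this.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

// Running 5 separate JVM forks averages out fork-to-fork JIT and GC variance.
@Fork(5)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class FromJsonBench {

    @Benchmark
    public Object fromJson() {
        // parse a fixed json document here; the body is a placeholder
        return new Object();
    }
}
```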

@htmldoug
Contributor Author

I think I've figured out and fixed the javadoc warnings.

@rashtao rashtao self-requested a review December 2, 2019 11:48
Collaborator

@rashtao rashtao left a comment

LGTM

@rashtao rashtao merged commit 6ba8cc9 into arangodb:master Dec 2, 2019
@htmldoug
Contributor Author

htmldoug commented Jan 8, 2020

Also, raising the minimum required Java version to 7 would enable us to use ByteBuffers, avoiding Arrays.copy every time we have to resize the buffers, which imho would yield considerable performance improvements.

Hey @rashtao, how would this work? I only have a light understanding of ByteBuffers, but my impression was that HeapByteBuffer and DirectByteBuffer were fixed capacity, i.e. not resizable. Are you picturing composing these into a composite like akka's ByteString? How were you envisioning this working?

@rashtao
Collaborator

rashtao commented Jan 9, 2020

@htmldoug
I mean DirectByteBuffer would have higher write throughput than byte[] and no impact on GC. I have no experience with akka's ByteString, but I think composing could also lead to additional performance improvements, e.g. something like Netty's CompositeByteBuf.
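A minimal sketch of what writing through a direct buffer looks like (illustrative only; as discussed below, the capacity is fixed at allocation time):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectBufferSketch {
    public static void main(final String[] args) {
        // Direct buffers are allocated off-heap, so their contents are not scanned by the GC;
        // the capacity, however, is fixed when the buffer is created.
        final ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
        buffer.put((byte) 0x0b);                              // e.g. a tag byte
        buffer.put("key".getBytes(StandardCharsets.UTF_8));   // raw UTF-8 bytes
        buffer.flip();                                        // switch from writing to reading
        System.out.println("first byte: " + buffer.get());
    }
}
```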

@htmldoug
Contributor Author

htmldoug commented Jan 9, 2020

ensureCapacity() is producing a ton of non-TLAB allocations. Growing the array by 1.5x rather than the more typical 2x is causing a lot of churn.

I'm still not entirely sure how DirectByteBuffer would help, but a composite (at least within the builder) would be really good here. I started exploring that direction, but one barrier to implementing it is that VPackBuilder.remove() does deletions via some left-shifting of the array that I haven't quite figured out yet, and I ran out of time.
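To make the growth-factor point concrete, here is a rough sketch of exponential growth in an ensureCapacity-style helper (illustrative, not the actual VPackBuilder code):

```java
import java.util.Arrays;

final class GrowthSketch {

    // Doubling roughly halves the number of reallocations (and large copies)
    // compared to 1.5x growth; 1.5x would be newSize + (newSize >> 1).
    static byte[] ensureCapacity(final byte[] buffer, final int required) {
        if (required <= buffer.length) {
            return buffer;
        }
        int newSize = Math.max(8, buffer.length);
        while (newSize < required) {
            newSize *= 2;
        }
        return Arrays.copyOf(buffer, newSize);
    }
}
```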

@siilike
Contributor

siilike commented Jan 11, 2020

Why not just implement an OutputStream that stores everything in a List<byte[]>? Then we could have a toByteArray() method that copies the whole content to a single byte array (e.g. for Slice) and write* methods that are optimized to use the original byte arrays directly.

The only issue is the Builder.remove() method that needs some more attention.

For a more complex solution Netty has CompositeByteBuf which works similarly, but uses ByteBuffers for storage.
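A bare-bones sketch of the List<byte[]>-backed OutputStream idea (chunk size, class name, and the lack of bounds/IO handling are all illustrative):

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: accumulate fixed-size chunks in a List<byte[]> and only
// copy into a single array on demand.
final class ChunkedOutputStream extends OutputStream {
    private static final int CHUNK_SIZE = 8192;
    private final List<byte[]> chunks = new ArrayList<byte[]>();
    private byte[] current = new byte[CHUNK_SIZE];
    private int pos = 0;

    @Override
    public void write(final int b) {
        if (pos == current.length) {
            chunks.add(current);               // chunk is full, start a new one
            current = new byte[CHUNK_SIZE];
            pos = 0;
        }
        current[pos++] = (byte) b;
    }

    byte[] toByteArray() {
        final byte[] out = new byte[chunks.size() * CHUNK_SIZE + pos];
        int offset = 0;
        for (final byte[] chunk : chunks) {    // all stored chunks are full
            System.arraycopy(chunk, 0, out, offset, CHUNK_SIZE);
            offset += CHUNK_SIZE;
        }
        System.arraycopy(current, 0, out, offset, pos);
        return out;
    }
}
```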

@htmldoug
Contributor Author

htmldoug commented Jan 11, 2020

@siilike, yeah, that'd be an improvement and there's good precedent for that.

Dreaming about the ideal state for a moment, I'd love the option to back both VPackBuilder and VPackSlice entirely on disk for situations where I'm dealing with hundreds-of-megabytes json files that my apps sometimes receive. I haven't checked, but I'd be surprised if arangodb isn't already doing something similar.

This would require decoupling VPackBuilder/Slice from their backing implementation, which could be done by identifying all the operations currently used (appendByte/s(...), readByte/s(...), even leftShift()?), and stuffing them into some pluggable interface VPackStore {...}. That'd enable your ListByteVPackStore, the current baseline CompactArrayByteVPackStore, @rashtao's DirectByteBufferVPackStore, or a kafka-style RandomAccessFileVPackStore for my stupidly large inputs.
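A rough sketch of what that pluggable interface might look like, using the hypothetical names from the comment above:

```java
// Hypothetical interface; method names mirror the operations listed above.
interface VPackStore {

    void appendByte(byte b);

    void appendBytes(byte[] bytes, int offset, int length);

    byte readByte(long index);

    void readBytes(long index, byte[] target, int offset, int length);

    // Shift `length` bytes starting at `index` left by `amount` (needed by VPackBuilder.remove()).
    void leftShift(long index, int length, int amount);

    long size();
}
```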

@rashtao
Collaborator

rashtao commented Jan 13, 2020

I would suggest encapsulating all the buffer operations in one single class, call it e.g. VPackSliceBuffer, and implementing there all the methods for manipulating the buffer. The first implementation can simply delegate to the underlying byte array operations. Once we have it, we can test and benchmark different implementations. We can even think about offering different buffer implementations and letting the user choose which one to use.

Also I would move the discussion to https://arangodb-community.slack.com/archives/C5T51CW2J so we can get feedback and opinions from the community.

@siilike
Contributor

siilike commented Jan 13, 2020

"Don't have an account on this workspace yet?
Contact the workspace administrator for an invitation"

@rashtao
Collaborator

rashtao commented Jan 14, 2020

Sorry @siilike, you can register here: https://slack.arangodb.com/
